SLIP: Self-supervision Meets Language-Image Pre-training

Authors

Abstract

Recent work has shown that self-supervised pre-training leads to improvements over supervised learning on challenging visual recognition tasks. CLIP, an exciting new approach to learning with language supervision, demonstrates promising performance on a wide variety of benchmarks. In this work, we explore whether self-supervised learning can aid in the use of language supervision for visual representation learning with Vision Transformers. We introduce SLIP, a multi-task learning framework combining self-supervised learning and CLIP pre-training. After pre-training, we thoroughly evaluate representation quality and compare performance to both CLIP and self-supervised learning under three distinct settings: zero-shot transfer, linear classification, and end-to-end finetuning. Across ImageNet and a battery of additional datasets, we find that SLIP improves accuracy by a large margin. We validate our results with further experiments on different model sizes, training schedules, and pre-training datasets. Our findings show that SLIP enjoys the best of both worlds: better performance than self-supervision (+8.1% accuracy) and language supervision (+5.2% accuracy). Code is available at: github.com/facebookresearch/SLIP
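The multi-task objective described in the abstract, combining a CLIP-style image-text contrastive loss with a SimCLR-style self-supervised loss over two augmented views, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function names, temperature values, and the `ssl_scale` weight are illustrative choices, not taken from the source.

```python
import numpy as np

def softmax_xent(logits, targets):
    # mean cross-entropy over rows of a logit matrix
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # symmetric InfoNCE: matched image-text pairs are the positives
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    targets = np.arange(len(img))
    return 0.5 * (softmax_xent(logits, targets) + softmax_xent(logits.T, targets))

def simclr_loss(z1, z2, temperature=0.1):
    # NT-Xent between two augmented views of the same images
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = len(z1)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    # positive for view i is the other view of the same image
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return softmax_xent(sim, targets)

def slip_loss(img_emb, txt_emb, view1_emb, view2_emb, ssl_scale=1.0):
    # multi-task objective: language supervision + self-supervision
    return clip_loss(img_emb, txt_emb) + ssl_scale * simclr_loss(view1_emb, view2_emb)
```

In this sketch the two losses are simply summed with a scalar weight; in practice the SSL branch would use its own projection head on top of the shared image encoder.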


Similar articles

Language Acquisition Meets Language Evolution

Recent research suggests that language evolution is a process of cultural change, in which linguistic structures are shaped through repeated cycles of learning and use by domain-general mechanisms. This paper draws out the implications of this viewpoint for understanding the problem of language acquisition, which is cast in a new, and much more tractable, form. In essence, the child faces a pro...


On the Iranian In-service and Pre-service Language Teachers’ Perceptions of Educational Supervision Concerning their Professional Development

Teacher supervision plays a pivotal role in the improvement of the education system and in how teachers and student teachers perceive it. Consequently, language teacher supervisors can utilize appropriate supervisory models to keep teachers up to date and promote them professionally. The present study investigated the role of language teacher supervisors in student teachers and in-service teac...


Clinical supervision training across contexts.

BACKGROUND Clinicians require specific skills to teach or supervise students in the workplace; however, there are barriers to accessing faculty member development, such as time, cost and suitability. The Clinical Supervision Support Across Contexts (ClinSSAC) programme was designed to provide accessible interprofessional educator training to clinical supervisors across a wide range of clinical ...


Minimal Supervision for Language Learning

A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children begin in solving this problem when learning their first lan...


Language Generation with Recurrent Generative Adversarial Networks without Pre-training

Generative Adversarial Networks (GANs) have shown great promise recently in image generation. Training GANs for text generation has proven to be more difficult, because of the non-differentiable nature of generating text with recurrent neural networks. Consequently, past work has either resorted to pre-training with maximum likelihood or used convolutional networks for generation. In this work, ...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-19809-0_30